pbibra commented Oct 15, 2025

Summary

This PR updates GraphBuilder and GraphConfig to align with the new span-to-metadata configuration, and moves some post-processing logic into Astra, where the span hierarchy it depends on is still available.

GraphConfig

  • Combines span tags, joined by a configurable delimiter, to populate a metadata field.
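GraphConfig's code isn't part of this description. A minimal sketch of the idea, assuming the config supplies an ordered list of tag keys plus a delimiter. The `TagCombiner` class, its method, and the choice to skip missing or empty tags are hypothetical, not the PR's implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: builds one metadata value from several span tags,
// joined by a configurable delimiter.
class TagCombiner {
    private final List<String> tagKeys; // tags to combine, in order (assumed config shape)
    private final String delimiter;     // e.g. "." or "/"

    TagCombiner(List<String> tagKeys, String delimiter) {
        this.tagKeys = List.copyOf(tagKeys);
        this.delimiter = Objects.requireNonNull(delimiter);
    }

    // Joins the configured tag values that are present; missing or empty
    // tags are skipped (an assumption, not confirmed by the PR).
    String combine(Map<String, String> spanTags) {
        List<String> parts = new ArrayList<>();
        for (String key : tagKeys) {
            String value = spanTags.get(key);
            if (value != null && !value.isEmpty()) {
                parts.add(value);
            }
        }
        return String.join(delimiter, parts);
    }
}
```

For example, `new TagCombiner(List.of("a", "b"), ".").combine(Map.of("a", "x", "b", "y"))` returns `"x.y"`; the tag keys here are placeholders.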

GraphBuilder

  • Generates the full list of nodes and edges when no span filter is provided.
  • When a filter is present, runs an iterative DFS that removes intermediate nodes that don't match the filter and reconnects the remaining ones.
  • Filter logic is OR-based: any span that satisfies at least one condition is retained in the final list of dependencies.
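A minimal sketch of the OR semantics, assuming a filter shaped like the `buildFilter` JSON in the Testing section (attribute name mapped to a list of accepted values) and spans exposed as a simple attribute map. `SpanFilter` and its methods are hypothetical:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: OR-based span filter. Keys are span attributes
// (e.g. "operation_name"), values are the accepted attribute values.
class SpanFilter {
    private final Map<String, List<String>> conditions;

    SpanFilter(Map<String, List<String>> conditions) {
        this.conditions = conditions;
    }

    // No conditions: every span is kept, so the full graph is built.
    boolean isEmpty() {
        return conditions.isEmpty();
    }

    // A span is retained if it satisfies at least one condition.
    boolean matches(Map<String, String> spanAttributes) {
        if (isEmpty()) {
            return true;
        }
        for (Map.Entry<String, List<String>> condition : conditions.entrySet()) {
            String value = spanAttributes.get(condition.getKey());
            List<String> accepted = condition.getValue();
            if (value != null && accepted != null && accepted.contains(value)) {
                return true;
            }
        }
        return false;
    }
}
```

With OR semantics, adding a condition can only grow the set of retained spans, never shrink it.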

Post-Processing within Astra
Added a post-processing DFS step during dependency generation for specific span filters. This logic was originally planned to run outside Astra, but that approach lost the span hierarchy required to properly drop intermediate spans and reconnect relevant nodes.
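The DFS itself isn't shown here. To illustrate why the hierarchy matters, here is a minimal single-pass sketch (not the PR's code) that walks the span tree iteratively and carries the nearest retained ancestor down each path, so every retained span is reconnected across any dropped intermediates. String span IDs, a parent-to-children map, a single root, and the `ReconnectingDfs` name are assumptions:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: iterative DFS that drops non-matching spans and
// links each retained span to its nearest retained ancestor.
class ReconnectingDfs {
    static List<String[]> edges(String rootId,
                                Map<String, List<String>> childrenByParent,
                                Predicate<String> retained) {
        List<String[]> edges = new ArrayList<>();
        // Each frame is {spanId, nearestRetainedAncestorId (may be null)}.
        Deque<String[]> stack = new ArrayDeque<>();
        stack.push(new String[] {rootId, null});
        while (!stack.isEmpty()) {
            String[] frame = stack.pop();
            String spanId = frame[0];
            String ancestor = frame[1];
            if (retained.test(spanId)) {
                if (ancestor != null) {
                    edges.add(new String[] {ancestor, spanId}); // bridge over dropped spans
                }
                ancestor = spanId; // descendants now attach to this span
            }
            for (String child : childrenByParent.getOrDefault(spanId, List.of())) {
                stack.push(new String[] {child, ancestor});
            }
        }
        return edges;
    }
}
```

Because the ancestor travels with each stack frame, this needs the parent/child links; from a flat list of matching spans alone there is no way to tell which retained span a dropped intermediate sat under.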

Requirements

Testing

▲ ~ curl -G 'http://localhost:8081/api/v1/trace/eZVwBjNwRg6LFefMEg-nRw==/subgraph' \
    --data-urlencode 'buildFilter={"operation_name": ["dropwizard.request"]}' | jq
{
  "subgraph": {
    "edges": [
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "437a9f05c78b14cedc3ea8f9bafae0838e0875b9dc2f4b76ba0bb36cd4c88f3f"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "cdb6c98c6b46df0b7d0e7964f80bd6b22423f2252c6264e4a89c7355055f1dab"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "7bf016d85a57cad93435953f7d833755b5248bb0dc603cc7382e9134acf628a8"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "4089069a670cc89865fe21e203382df2f13dc4f9e7289a3fbc0b4bf7a16b74d0"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "08c22953c68f7f7bcfd183f864913dffc6999b53d4f3f07471afd94e1c533230"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "2eb3447cb192a9da9cd1dd284cbcad4c4dd6f636584d17957dd7139cde08cd1a"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "f33ff1bde4908a63b85e0734beb74a5ea212cf96ac396daea4474dc004be0a7d"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "356788092a336ce0c8aa235241e877f0e65715ba0ac23084549045f63fe8fb35"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "8b91da8ba8fb5d00480270dc8735123c198f08c496e7223d43309378d2a23b79"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "e9d30c07602571ff1fe8a0b2e10cd2f117ee4256291dd638385e90e11c20a198"
      },
      {
        "metadata": {
          "operation": "dropwizard.request"
        },
        "sourceNodeId": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "targetNodeId": "c7301c3fb1c58566b2aa2bd26eaa03d2d0997232f1d666a94e1835afd9941da3"
      }
    ],
    "nodes": [
      {
        "id": "96a2b9678eb1121c7896d7153a5cb5d4ee0fecd1773173b2cbb6e0c06a2a9297",
        "metadata": {
          "resource": "//{version:(v2|v3)}/viaduct/scope/data-api{operation:(/[a-zA-Z0-9_-]+)?}",
          "service": "viaduct-dispatcher-production.viaduct-dispatcher-production"
        }
      },
      {
        "id": "cdb6c98c6b46df0b7d0e7964f80bd6b22423f2252c6264e4a89c7355055f1dab",
        "metadata": {
          "resource": "viaduct/SillaViaductSuperhostQuery",
          "service": "viaduct-shard-14-production.viaduct-production"
        }
      },
      {
        "id": "8b91da8ba8fb5d00480270dc8735123c198f08c496e7223d43309378d2a23b79",
        "metadata": {
          "resource": "viaduct/DayuMessageThreadQuery",
          "service": "viaduct-shard-180-production.viaduct-production"
        }
      },
      {
        "id": "e9d30c07602571ff1fe8a0b2e10cd2f117ee4256291dd638385e90e11c20a198",
        "metadata": {
          "resource": "viaduct/DayuMessageThreadQuery",
          "service": "viaduct-shard-8-production.viaduct-production"
        }
      },
      {
        "id": "437a9f05c78b14cedc3ea8f9bafae0838e0875b9dc2f4b76ba0bb36cd4c88f3f",
        "metadata": {
          "resource": "viaduct/PocariMediaCollectionQuery",
          "service": "viaduct-shard-15-production.viaduct-production"
        }
      },
      {
        "id": "f33ff1bde4908a63b85e0734beb74a5ea212cf96ac396daea4474dc004be0a7d",
        "metadata": {
          "resource": "viaduct/SillaViaductGuestFavoriteQuery",
          "service": "viaduct-shard-5-production.viaduct-production"
        }
      },
      {
        "id": "c7301c3fb1c58566b2aa2bd26eaa03d2d0997232f1d666a94e1835afd9941da3",
        "metadata": {
          "resource": "viaduct/PocariMediaCollectionQuery",
          "service": "viaduct-shard-13-production.viaduct-production"
        }
      },
      {
        "id": "356788092a336ce0c8aa235241e877f0e65715ba0ac23084549045f63fe8fb35",
        "metadata": {
          "resource": "viaduct/DayuGetThreadsByUserIdQuery",
          "service": "viaduct-shard-4-production.viaduct-production"
        }
      },
      {
        "id": "4089069a670cc89865fe21e203382df2f13dc4f9e7289a3fbc0b4bf7a16b74d0",
        "metadata": {
          "resource": "viaduct/DayuMessageThreadQuery",
          "service": "viaduct-shard-1-production.viaduct-production"
        }
      },
      {
        "id": "08c22953c68f7f7bcfd183f864913dffc6999b53d4f3f07471afd94e1c533230",
        "metadata": {
          "resource": "viaduct/AirlockRiskCheckQuery",
          "service": "viaduct-shard-20-production.viaduct-production"
        }
      },
      {
        "id": "2eb3447cb192a9da9cd1dd284cbcad4c4dd6f636584d17957dd7139cde08cd1a",
        "metadata": {
          "resource": "viaduct/DayuGetThreadsByUserIdQuery",
          "service": "viaduct-shard-10-production.viaduct-production"
        }
      },
      {
        "id": "7bf016d85a57cad93435953f7d833755b5248bb0dc603cc7382e9134acf628a8",
        "metadata": {
          "resource": "viaduct/SillaViaductUserPastTripsQuery",
          "service": "viaduct-shard-2-production.viaduct-production"
        }
      }
    ]
  },
  "subgraphBuildTimeMs": 31,
  "traceFetchTimeMs": 474
}

▲ ~ curl -G 'http://localhost:8081/api/v1/trace/eZVwBjNwRg6LFefMEg-nRw==/subgraph' \
    --data-urlencode 'buildFilter={"operation_name": ["http.request"]}' | jq
{
  "subgraph": {
    "edges": [
      {
        "metadata": {
          "operation": "http.request"
        },
        "sourceNodeId": "4089069a670cc89865fe21e203382df2f13dc4f9e7289a3fbc0b4bf7a16b74d0",
        "targetNodeId": "cc71230f648249a1ce3f4c1852a811fb9f76ce2c59f58aba602003e1a66796ec"
      },
      {
        "metadata": {
          "operation": "http.request"
        },
        "sourceNodeId": "e9d30c07602571ff1fe8a0b2e10cd2f117ee4256291dd638385e90e11c20a198",
        "targetNodeId": "af223332f46eb1260a3b5f61ee0fda386685f29505e6f2c2f537536e315f6edc"
      },
      ...
    ],
    "nodes": [
      {
        "id": "bfd1faa3abcbeec2fe2e7871474a4ca283d945fe3eb686f4819e4e14f8c85700",
        "metadata": {
          "resource": "panda/AvailabilityBatchQuery",
          "service": "panda-production.panda-production"
        }
      },
      {
        "id": "f1a283b61ca5ddf47b706990c9b5b79b0a4437e13456cb04fc91863d55ab4691",
        "metadata": {
          "resource": "/Pocari/getRichMessages",
          "service": "pocari-production.pocari-production"
        }
      },
    ...
    ]
  },
  "subgraphBuildTimeMs": 85,
  "traceFetchTimeMs": 419
}
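With `-G`, curl appends the `--data-urlencode` value to the URL as a query string instead of sending it as a request body. A small sketch that builds the same URL in Java, e.g. for a test client; `SubgraphUrl` is a hypothetical helper, and the host, port, and trace ID are taken from the commands above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reproduces the URL that
// `curl -G --data-urlencode 'buildFilter=...'` requests.
class SubgraphUrl {
    static String build(String baseUrl, String traceId, String buildFilterJson) {
        // URLEncoder does form encoding (space -> '+'); swap to %20 as curl does.
        String encoded = URLEncoder.encode(buildFilterJson, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return baseUrl + "/api/v1/trace/" + traceId + "/subgraph?buildFilter=" + encoded;
    }
}
```

Replacing `+` afterwards is safe because a literal `+` in the input has already been encoded as `%2B`.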

pbibra marked this pull request as ready for review on October 20, 2025 at 19:31
* @param spanIdToSpans Lookup map from span ID to spans
* @param parentSpanIdToChildSpans Map from parent span ID to list of child spans
*/
private void dfsFilter(

I think this could do less work. It looks like, for each filtered node, it traverses that node's entire child subgraph, so spans that sit below several filtered nodes are visited repeatedly.

If you structured the algorithm a little differently, you could do it in roughly one traversal in total, using an approach like this:

filtered-nodes = [...]
edges = []
for n in filtered-nodes:
  transitive-matching-children = collect-transitive-matching-children(n, filtered-nodes)
  for c in transitive-matching-children:
    edges << create-edge(n, c)

def collect-transitive-matching-children(n, filtered-nodes):
  results = []
  work = [n]
  while work not empty:
    current = work.pop
    for c in get-children(current):
      if c not in filtered-nodes:
        work.push(c)
      else:
        results.push(c)
  return results

You could also add a visited check in case of cycles, though I think cycles are more likely in the derived graph than in the original.
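A Java rendering of the pseudocode above, including the optional visited check. It is a sketch under assumptions: span IDs are strings, children come from a parent-to-children map, and `FilteredEdges` is a hypothetical name. Each search stops at the first filtered span on every path, so in a tree the regions explored from different filtered nodes don't overlap, which is what brings the total close to one traversal:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the reviewer's approach: for each filtered node,
// collect the nearest filtered descendants and emit an edge to each.
class FilteredEdges {
    static List<String[]> build(Set<String> filtered, Map<String, List<String>> children) {
        List<String[]> edges = new ArrayList<>();
        for (String n : filtered) {
            for (String c : collectTransitiveMatchingChildren(n, filtered, children)) {
                edges.add(new String[] {n, c});
            }
        }
        return edges;
    }

    static List<String> collectTransitiveMatchingChildren(
            String n, Set<String> filtered, Map<String, List<String>> children) {
        List<String> results = new ArrayList<>();
        Set<String> visited = new HashSet<>(); // guards against cycles
        visited.add(n);                        // and against self-edges
        Deque<String> work = new ArrayDeque<>();
        work.push(n);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String c : children.getOrDefault(current, List.of())) {
                if (!visited.add(c)) {
                    continue; // already seen on this search
                }
                if (filtered.contains(c)) {
                    results.add(c); // stop: c gets its own pass in build()
                } else {
                    work.push(c);   // intermediate span: keep descending
                }
            }
        }
        return results;
    }
}
```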

Comment on lines +66 to +67
return options.entrySet().stream()
.filter(entry -> entry.getValue() != null && !entry.getValue().isEmpty())

You could move this filter into the constructor instead of re-running it for each span.
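A minimal sketch of that suggestion, assuming `options` is a `Map<String, List<String>>` (the reviewed snippet doesn't show its type). `CompiledFilter` and its methods are hypothetical; the null/empty predicate is copied from the reviewed lines but evaluated once, in the constructor:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: pre-filter option entries once, then reuse them per span.
class CompiledFilter {
    private final Map<String, List<String>> activeOptions;

    CompiledFilter(Map<String, List<String>> options) {
        // Same predicate as the reviewed stream, evaluated a single time.
        this.activeOptions = options.entrySet().stream()
                .filter(entry -> entry.getValue() != null && !entry.getValue().isEmpty())
                .collect(Collectors.toUnmodifiableMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    int activeCount() {
        return activeOptions.size();
    }

    // Per-span check only iterates the pre-filtered entries (OR semantics).
    boolean matches(Map<String, String> spanAttributes) {
        if (activeOptions.isEmpty()) {
            return true; // assumed: no active conditions behaves like "no filter"
        }
        for (Map.Entry<String, List<String>> e : activeOptions.entrySet()) {
            String value = spanAttributes.get(e.getKey());
            if (value != null && e.getValue().contains(value)) {
                return true;
            }
        }
        return false;
    }
}
```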

spanIdToSpans.put(span.getId(), span);

String parentId = span.getParentId();
if (parentId != null) {

I think -1 is also used to mean a missing parent ID. We might want to clean that up, but for now you might want to check for it here.
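A small sketch of that check, assuming parent IDs are strings as in the snippet above (`span.getParentId()` assigned to a `String`). Treating `"-1"` and the empty string as missing, and the `ParentIds` helper itself, are assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: normalizes "missing parent" sentinels when indexing spans.
final class ParentIds {
    private ParentIds() {}

    // null, "" and "-1" all mean "no parent" (the last two are assumptions).
    static boolean hasParent(String parentId) {
        return parentId != null && !parentId.isEmpty() && !"-1".equals(parentId);
    }

    // Groups child span IDs by parent, skipping spans with a missing parent.
    static Map<String, List<String>> childrenByParent(Map<String, String> parentBySpanId) {
        Map<String, List<String>> children = new HashMap<>();
        parentBySpanId.forEach((spanId, parentId) -> {
            if (hasParent(parentId)) {
                children.computeIfAbsent(parentId, k -> new ArrayList<>()).add(spanId);
            }
        });
        return children;
    }
}
```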

}
}

long start = System.currentTimeMillis();

You could use io.micrometer.core.instrument.Timer for this; that way you could also wire it up as a metric in addition to reporting it in the response.

pbibra merged commit a209270 into airbnb-main on Nov 7, 2025
2 checks passed
pbibra deleted the pbibra-fix-graph-building-logic branch on November 7, 2025 at 18:08